{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# EDA example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    from textaugment import EDA\n",
    "except ModuleNotFoundError:\n",
    "    !pip install textaugment\n",
    "    from textaugment import EDA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = EDA(random_state=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Synonym Replacement\n",
    "Randomly choose *n* words from the sentence that are not stop words. Replace each of these words with one of its synonyms chosen at random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "John is going to townspeople\n"
     ]
    }
   ],
   "source": [
    "output = t.synonym_replacement(\"John is going to town\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random Insertion\n",
    "Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this *n* times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "John is going st john to town\n"
     ]
    }
   ],
   "source": [
    "output = t.random_insertion(\"John is going to town\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random Swap\n",
    "Randomly choose two words in the sentence and swap their positions. Do this *n* times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "John to going is town\n"
     ]
    }
   ],
   "source": [
    "output = t.random_swap(\"John is going to town\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random  Deletion\n",
    "Randomly remove each word in the sentence with probability *p*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "going to town\n"
     ]
    }
   ],
   "source": [
    "output = t.random_deletion(\"John is going to town\", p=0.2)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cite the paper\n",
    "```\n",
    "@article{marivate2019improving,\n",
    "  title={Improving short text classification through global augmentation methods},\n",
    "  author={Marivate, Vukosi and Sefara, Tshephisho},\n",
    "  journal={arXiv preprint arXiv:1907.03752},\n",
    "  year={2019}\n",
    "}```\n",
    "\n",
    "https://arxiv.org/abs/1907.03752"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
